Robust subgroup discovery
Abstract
We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, including traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, finding optimal subgroup lists is NP-hard. Therefore, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. In fact, the greedy gain is shown to be equivalent to a Bayesian one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. Furthermore, we empirically show on 54 datasets that SSD++ outperforms previous subgroup discovery methods in terms of quality, generalisation on unseen data, and subgroup list size.
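The MDL view of subgroup lists described above can be illustrated with a small sketch. It rests on several assumptions that do not come from the paper: a single nominal target, candidate subgroups supplied as precomputed boolean masks instead of being found by SSD++'s own search, a flat hypothetical `description_bits` cost per subgroup, and a default rule fixed to the dataset's marginal distribution. Each subgroup's covered target values are encoded with the multinomial NML code. The parametric complexity is computed with the linear-time recurrence of Kontkanen and Myllymäki. The greedy loop adds whichever candidate compresses the still-uncovered rows most relative to the default rule, and it stops when no candidate yields a positive gain. All function names and the toy data are illustrative. This is not the authors' SSD++ implementation.

```python
import math
from collections import Counter


def multinomial_complexity(n: int, k: int) -> float:
    """Parametric complexity C(n, k) of a k-category multinomial on n samples."""
    if n == 0 or k == 1:
        return 1.0

    def xlogx(x: int) -> float:
        # x * ln(x / n), using the convention 0 * ln 0 = 0
        return x * math.log(x / n) if x > 0 else 0.0

    # C(n, 2): sum of maximised binomial likelihoods, computed in log space
    c2 = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
                 + xlogx(h) + xlogx(n - h))
        for h in range(n + 1)
    )
    # Kontkanen-Myllymaki recurrence: C(n, j+2) = C(n, j+1) + n/j * C(n, j)
    prev, curr = 1.0, c2
    for j in range(1, k - 1):
        prev, curr = curr, curr + n / j * prev
    return curr


def nml_bits(labels, k):
    """Bits to encode `labels` with the k-category multinomial NML code."""
    n = len(labels)
    neg_log_ml = -sum(c * math.log2(c / n) for c in Counter(labels).values())
    return neg_log_ml + math.log2(multinomial_complexity(n, k))


def marginal_bits(labels, marginal):
    """Bits to encode `labels` with the (fixed) dataset marginal distribution."""
    return -sum(math.log2(marginal[y]) for y in labels)


def greedy_subgroup_list(y, candidates, description_bits=4.0):
    """Greedily build an ordered subgroup list; description_bits is hypothetical."""
    n = len(y)
    counts = Counter(y)
    marginal = {label: c / n for label, c in counts.items()}
    k = len(counts)
    uncovered = [True] * n
    chosen = []
    while True:
        best_name, best_gain = None, 0.0  # only strictly positive gains qualify
        for name, mask in candidates.items():
            if any(name == c for c, _ in chosen):
                continue
            # A subgroup in a list only covers rows not claimed by earlier subgroups
            labels = [y[i] for i in range(n) if mask[i] and uncovered[i]]
            if not labels:
                continue
            gain = marginal_bits(labels, marginal) - nml_bits(labels, k) - description_bits
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            return chosen
        chosen.append((best_name, best_gain))
        uncovered = [u and not m for u, m in zip(uncovered, candidates[best_name])]


# Hypothetical toy data: 40 rows with a binary target
y = ["yes"] * 19 + ["no"] + ["yes"] * 3 + ["no"] * 17
candidates = {
    "smoker": [i < 20 for i in range(40)],
    "non_smoker": [i >= 20 for i in range(40)],
    "even_row": [i % 2 == 0 for i in range(40)],
}
for name, gain in greedy_subgroup_list(y, candidates):
    print(f"{name}: {gain:.2f} bits")
```

On this toy data, `smoker` compresses its rows most and is added first. `non_smoker` is then added for the remaining rows, while `even_row` never recovers its description cost. The gain here only mirrors the idea behind the paper's criterion. It does not reproduce the paper's exact model encoding or the stated equivalence to Bayesian tests.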
Similar resources
Subgroup Discovery
The discovery of (interesting) subgroups has a high practical relevance in all domains of science or business. For example, consider statements such as: “the unemployment rate is above average for young men with a low educational level”, “smokers with a positive family history are at a significantly higher risk for coronary heart disease”, or “single males living in rural areas do rarely take o...
Subgroup Discovery Method SUBARP
This paper summarizes the subgroup discovery method SUBARP. The discussion is to provide an intuitive understanding of the ideas underlying the method, and mathematical details are omitted.
Contrasting Subgroup Discovery
Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia International Postgraduate School Jožef Stefan, Ljubljana, Slovenia Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia Email: {laura.langohr, hannu...
Subgroup Discovery – Advanced Review
Subgroup discovery is a broadly applicable descriptive data mining technique for identifying interesting subgroups according to some property of interest. This article summarizes fundamentals of subgroup discovery, before it reviews algorithms and further advanced methodological issues. In addition, we briefly discuss tools and applications of subgroup discovery approaches. In that context, we ...
Journal
Journal title: Data Mining and Knowledge Discovery
Year: 2022
ISSN: 1573-756X, 1384-5810
DOI: https://doi.org/10.1007/s10618-022-00856-x